(57) Abstract: The invention provides an artificial protein for quantitative analysis of
the proteome of a sample, cell or organism, comprising at least two consecutive peptides
linked by a cleavage sequence for separating the peptides; a singular marker on one or
more peptides for determination of the absolute amount of this fragment; and N-terminal
and C-terminal extensions for protection of the peptides; wherein each peptide represents
one single protein of the sample, cell or organism and each peptide is in a defined
stoichiometry. The invention further provides a collection of peptides, a vector and a
kit comprising the artificial protein and a method for quantitative analysis of the
proteome.

Artificial protein, method for absolute quantification of proteins and uses thereof

Field of the invention 

The present invention relates to proteomics and more specifically to absolute 
quantification of proteins. 

Background of the invention 

The need for absolute quantification in proteomics is becoming increasingly urgent.
The most promising method is based on stable isotope dilution involving simultaneous
determination of representative proteolytic peptides and stable isotope labeled analogs.
The principal limitation to widespread implementation of this approach is the
availability of standard signature peptides in accurately known amounts.

The two primary themes in proteomics are protein identification and the comparison of
protein expression levels in two physiological or pathological states (comparative
proteomics). The long term goal of being able to define the entire proteome of a cell
is still unrealized, but the characterization of many thousands of proteins in a single
analysis is now attainable.

For proteomics to become a platform technology serving the emergent field of systems
biology, there is a pressing need for enhancement of quantification (Righetti,
Eur J Mass Spectrom 10 (2004), 335-348). Most comparative proteomics studies
deliver relative quantification, expressing the changes in amount of a protein in the
context of a second cellular state (for example Dunkley, Mol Cell Proteomics
(2004); Hoang, J Biomol Tech 14 (2003), 216-233; Ong, Mol Cell Proteomics
(2002), 376-386).

However, the goal must ultimately be to define the cellular concentrations of proteins
absolutely, whether as molarities or as numbers of molecules per cell. Absolute
quantification, which poses one of the greatest challenges in proteomics,
draws on well-established precepts in analytical chemistry, and requires either
external standards or internal standards (Sechi, Curr Opin Chem Biol 7 (2003),
70-77; Julka, J Proteome Res 3 (2004), 350-363). External standardization is typified
by immunodetection, whether solution phase or on position-addressable antibody
arrays (Walter, Trends Mol Med 8 (2002), 250-253; Lopez, J Chromatogr B
Analyt Technol Biomed Life Sci 787 (2003), 19-27). The second approach, reliant
on internal standardization, is based on mass spectrometry (MS), wherein highly
selective detection of ions (or ion fragmentations) characteristic of the analytes of
interest is combined with the use of internal standards.

In the most rigorous MS analyses, stable isotopic variants of the analytes are used
as internal standards. The key underlying principle is that the determination of
relative signal intensities during mass spectrometric analysis can be converted
into absolute quantities of analyte by reference to an authentic standard available
in known amounts. Direct application of this approach to intact proteins is
impractical, and it is common to adopt the principle of surrogacy, that is to quantify
indirectly by reference to a proteolytic peptide derived from the protein of interest.

Analyses based on these principles have been dubbed "AQUA" (absolute quantification)
using internal standards synthesized de novo by chemical methods (Gerber,
PNAS 100 (2003), 6940-6945). However, this approach does not lend itself
well to absolute quantification of large numbers of proteins, as each Q-peptide
would need to be chemically synthesised and independently quantified.

The international patent application PCT/US03/17686, published as WO 03/102220,
provides methods to determine the absolute quantity of proteins present in a
biological sample. The principle of WO 03/102220 is based on the generation of an
ordered array of differentially isotopically tagged pairs of peptides, wherein each
pair represents a unique protein, a specific protein isoform or a specifically
modified form of a protein. One element of the peptide pairs is a synthetically
generated, external standard and the other element of the pair is a peptide
generated by enzymatic digestion of the proteins in a sample mixture. For performing
the method of WO 03/102220 the standard peptides are calibrated so that
absolute amounts are known and added for comparison and quantification. A
sample of interest is also labelled with the same tag as used for the standard
peptides, differing only in the isotopic label. The pairs of signals, which correspond
to differentially labelled sample and standard peptides, are finally observed
and related to a list of expected masses based on the particular standard
peptides included. The disadvantage of WO 03/102220 is that standard peptides
need to be individually synthesised, purified and quantified. Moreover, both the
sample and the standard peptides need to be specifically labelled separately,
increasing the potential for variability between experiments.

Thus there is still an existing need to develop easy and convenient methods for 
absolute quantification in proteomics. 


Summary of the invention 

The present invention is directed to an artificial protein for quantitative analysis of
the proteome of a sample, cell or organism, comprising:

(a) at least two consecutive peptides linked by a cleavage sequence for separating
the peptides;

(b) a singular marker on one or more peptides for determination of the absolute
amount of the protein; and

(c) N-terminal and C-terminal extensions for protection of the peptides;

wherein each peptide represents one single protein of the sample, cell or organism
and each peptide is in a defined stoichiometry.

For the purpose of the invention the artificial protein is also named QCAT protein 
and the peptides used for the QCAT protein are called Q-peptides. 






The cleavage sequence between two Q-peptides may be an enzymatic or a
chemical cleavage sequence.

The N-terminal and C-terminal extensions protect the quantification peptides from
processing and exoproteolysis. The artificial protein further includes features
which allow for easy purification of the QCAT protein.

Each of the Q-peptides represents one single protein of the sample, cell or organism
and each peptide is in a defined stoichiometry, which is typically, but not
exclusively, 1:1.

The present invention further concerns a collection of Q-peptides, which covers
the complete proteome of an organism. This collection allows for rapid quantification
of the proteome of such an organism.

The present invention also concerns a vector comprising the QCAT protein and a 
kit comprising the vector and/or the QCAT protein. 

Moreover, the invention is directed to a method for quantitative analysis of the
proteome of a sample, cell or organism, comprising the steps of:

(a) quantifying the amount of the protein or one peptide containing the 
singular marker in an absolute manner; 

(b) generating a preparation of the proteins to be quantified; 

(c) mixing the products of steps (a) and (b); 

(d) completely cleaving the artificial protein of any of claims 1-10 and 
the proteins to be quantified in step (b) at the cleavage sequence; 

(e) determining the quantitative amount of peptides; 

(f) calculating the absolute amount of peptides, 
wherein the artificial protein and/or the peptides are isotopically labelled. 
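
The calculation in steps (e) and (f) can be illustrated with a minimal sketch. It assumes that the labelled peptide released from the artificial protein and the unlabelled peptide from the sample give equal signal per mole, so that the ratio of their signal intensities equals the ratio of their molar amounts. The function name, parameter names and numerical values are hypothetical and serve only as an illustration.

```python
def absolute_amount(sample_intensity, standard_intensity, standard_amount_pmol,
                    copies_per_qcat=1):
    """Return the amount (pmol) of a sample peptide by isotope dilution.

    standard_amount_pmol is the amount of artificial protein added in step (c);
    copies_per_qcat is the number of times the peptide occurs in the artificial
    protein (1 for the typical 1:1 stoichiometry).
    """
    if standard_intensity <= 0:
        raise ValueError("standard peptide signal must be positive")
    ratio = sample_intensity / standard_intensity
    return ratio * standard_amount_pmol * copies_per_qcat


# Hypothetical example: 10 pmol of labelled artificial protein was added, and
# the sample peptide signal is half as intense as the standard peptide signal.
amount = absolute_amount(sample_intensity=5000.0, standard_intensity=10000.0,
                         standard_amount_pmol=10.0)
print(f"{amount:.1f} pmol")
```

Because each sample peptide represents one protein, the amount calculated for the peptide is taken as the amount of its parent protein, provided cleavage in step (d) is complete.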

It is important to note that the proteins do not have to be purified; partial
purification followed by high resolution separation technologies would be equally
acceptable.



Brief description of the drawings

Figure 1 shows examples of Q-peptides selected as signature peptides;

Figure 2 shows the DNA sequence, translated protein sequence and features of the QCAT;

Figure 3 shows the characterisation of the QCAT protein and Q-peptides;

Figure 4 shows quantification using the QCAT protein; and

Figure 5 shows the use of the QCAT for muscle protein quantification.

Detailed description of the invention

The inventors describe here the design, expression and use of artificial proteins
that are concatamers of tryptic Q-peptides for a series of proteins, generated by
gene design de novo. The artificial protein, a concatamer of Q-peptides ("QCAT"),
is designed to include both N-terminal and C-terminal extensions. The function of
the extensions is to protect the true Q-peptides, and to introduce a purification tag
(such as a His-tag) and a sole cysteine residue for quantification of the QCAT.

The novel gene is inserted into a high-level expression vector and expressed in a
heterologous expression system such as E. coli. Within the QCAT protein, each
Q-peptide is in a defined stoichiometry (typically, but not exclusively, 1:1), such that
the entire set of concatenated Q-peptides can be quantified in molar terms by
determination of the QCAT protein. Moreover, the QCAT protein is readily produced
in unlabelled or labelled form by growth of the expression strain in defined medium
containing the chosen label.

The inventors have successfully designed and constructed an artificial gene encoding
a concatenation of tryptic peptides (a QCAT protein) from over 20 proteins.
The protein further includes features for quantification and purification. The
artificial protein was expressed in E. coli and synthesis of the correct product was
proven by mass spectrometry. The QCAT protein is readily digested with trypsin,
is easily quantified and can be used for absolute quantification of proteins. This
strategy brings within reach the accurate and absolute quantification of large
numbers of proteins in proteomics studies. Additionally, the QCAT was labelled by
selective incorporation of a stable isotope labelled amino acid or by incorporation
of 15N nitrogen atoms at every position in the protein.



It is therefore an object of the present invention to provide an artificial protein for
quantitative analysis of the proteome of a sample, cell or organism, comprising:

(a) at least two consecutive peptides linked by a cleavage sequence for separating
the peptides;

(b) a singular marker on one or more peptides for determination of the absolute
amount of the protein; and

(c) N-terminal and C-terminal extensions for protection of the peptides;

wherein each peptide represents one single protein of the sample, cell or organism
and each peptide is in a defined stoichiometry.



In a preferred embodiment of the invention, the artificial protein comprises 10-200
peptides, preferably 10-100 peptides, more preferred 20-60 peptides,
most preferred 60 peptides. It is possible to include multiple instances of specific
peptides to modify the stoichiometry.

In yet another embodiment, the cleavage sequence is cleaved by a protease,
preferably by trypsin, and the singular marker is a cysteine residue.

Further, one or more peptides of the artificial protein are repeated identically at
least one time, preferably one or more times, to achieve a particular stoichiometry
between all peptide sequences.
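
The effect of repeating a peptide on the stoichiometry can be sketched as follows. The sketch assumes complete cleavage, so that one mole of artificial protein releases as many moles of a peptide as there are copies of it in the construct. The function names and the example design are hypothetical.

```python
from collections import Counter


def peptide_stoichiometry(qcat_peptides):
    """Map each peptide to its number of copies per artificial protein."""
    return Counter(qcat_peptides)


def peptide_amounts(qcat_peptides, qcat_amount_pmol):
    """Amount (pmol) of each peptide released by complete cleavage."""
    return {peptide: copies * qcat_amount_pmol
            for peptide, copies in peptide_stoichiometry(qcat_peptides).items()}


# Hypothetical design: GFLIDGYPR is included twice, NLAPYSDELR once,
# giving a 2:1 stoichiometry between the two peptides.
design = ["GFLIDGYPR", "NLAPYSDELR", "GFLIDGYPR"]
print(peptide_amounts(design, qcat_amount_pmol=10.0))
```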



It is preferred to include an affinity tag (e.g. a histidine tag) for purification of the 
protein. It is particularly preferred to include the affinity tag in either the N-terminal 
or C-terminal extensions of the protein. 

In a further embodiment, the protein is labelled by an isotope, which is selected
from the group consisting of 13C, 15N, 2H and 18O.

In yet a further embodiment each peptide comprises between about 3 and 40
amino acids, preferably about 15 amino acids.

The protein may comprise a molecular weight of about 10-300 kDa, preferably
about 150-200 kDa, most preferred about 150 kDa, and is preferably expressed
in E. coli.

The origin of the proteome, i.e. the sample, cell or organism, is preferably a
mouse, rat, ape or human, but can be from any proteinaceous source.

In a preferred embodiment of the invention, the peptides represent different
conformational, metabolic or modification states of the protein, in order to quantify all
proteins derived posttranslationally from such a protein.

Further, the peptides of the artificial protein preferably have a defined molecular
weight distribution and quantitative ratios. A mass spectrometer can be calibrated,
preferably with molecular weights which result in equidistant mass spectrometry
signals. Preferably, one or more peptides are represented twice in order to
unambiguously label a reference molecular weight for calibration.

The invention is further directed to a collection of peptides, as defined in step (b)
of claim 1, which covers the complete proteome of an organism. All expressed
proteins of an organism are defined as the proteome of such organism. This allows
for rapid quantification of the proteome of such organism.

Further included in the disclosure of the present invention is the absolute
quantification of the protein levels of a certain reference strain, which can then be used to
compare the protein levels of this particular strain or similar strains under varying
experimental conditions.

The invention is also directed to a vector comprising a nucleic acid encoding the
artificial protein and a kit comprising the artificial protein and/or the nucleic acid
encoding the artificial protein. Further the invention is directed to a method for
quantitative analysis of the proteome of an organism, which is described in detail
above.

The present invention will be better understood by the encompassed examples 
and results with reference to the accompanying figures. 


Detailed description of the figures 

Figure 1. Examples of Q-peptides selected as signature peptides. 

For a series of proteins, peptides were selected (using multiple criteria) from proteins
identified as the abundant proteins in a soluble fraction of chicken skeletal
muscle, and assembled into an artificial protein, or Q-cat. Left side: Coomassie
blue stained, SDS-PAGE analysis of soluble chicken muscle proteins. Center:
MALDI-ToF spectrum of tryptic digestion of gel slices corresponding to selected
protein bands. The peptide ion labelled with an oval represents the peptide chosen
for inclusion in the Q-cat protein. Right side: mass and position of the indicated
signature peptides within the designed Q-cat protein. Details of the peptides
are in Table 1.

Figure 2. The DNA sequence, translated protein sequence and features of the QCAT.

The DNA sequence of the synthetic gene, with relevant cloning sites, is shown on
the top line and the derived amino acid sequence is shown below. The grey
blocked areas indicate the extent of the tryptic peptides, with the donor chicken
proteins, tryptic peptide assignment (T1-T25) and the peptide mass (in Da) indicated.
A non-cleavable Arg-Pro tryptic site within phosphoglycerate kinase (boxed)
is included to confirm the non-digestibility of this site. Peptides (white boxes)
encode the initiator methionine, N-terminal sacrificial sequence and spacer sequences,
and are not derived from proteins of interest. The black boxes highlight
the sequences carrying the unique cysteine residue for quantification and the His6 tag
for purification. T1 and T2 are sacrificial peptides designed to protect the
N-terminus of the first true Q-peptide (T3).

Figure 3. Characterisation of the QCAT protein and Q-peptides 

The pET21a/QCAT plasmid was transformed into E. coli DE3 cells and after a
period of exponential growth, the expression of the QCAT was induced with IPTG.
The cell lysates from pre-induced and induced cells were compared on SDS-PAGE
(inset). After solubilization of the pellet, and affinity chromatography on a
NiNTA column, the purified QCAT protein was homogeneous, and was digested in
solution with trypsin. The peptides were analysed by MALDI-ToF mass spectrometry.
The inset tryptic digestion map is shaded to indicate the relative intensities
of signals corresponding to each peptide in the mass spectrum; peptides
smaller than 900 Da, derived from the 'sacrificial' parts of the QCAT, are less readily
detected in this type of mass spectrometric analysis due to interfering ions.



Figure 4. Quantification using the QCAT protein 

The QCAT protein was prepared in unlabelled form (L: "light") and in a form uniformly
labelled with 15N (H: "heavy"). The H and L QCAT proteins were separately
purified, quantified and mixed in different ratios, before tryptic digestion and
measurement of peptide intensities by MALDI-ToF mass spectrometry. Panel a)
illustrates the mass spectrum for the Q-peptide for adenylate kinase
(GFLIDGYPR, 12 nitrogen atoms). In panel b) the measured L:H ratios were plotted
relative to the mixture ratio, in a triplicate series of experiments for which individual
points are shown. In the bottom panel, the data for seven peptides are collated
and expressed as mean ± SD (n=18-21). The dotted line defines the 95%
confidence limits of the fitted straight line.



Figure 5. Use of the QCAT for muscle protein quantification.

A preparation of soluble proteins from skeletal muscle of chicks at 1 d and 27 d was
mixed with [15N]QCAT, digested with trypsin and analysed by MALDI-ToF MS. For
a subset of proteins, it was possible to determine the intensities of the endogenous
and standard peptide, and from this, calculate the absolute amounts (in
nmol/g tissue) of each protein. Three animals were used at each time point; error
bars are SEM (n=3). Proteins were AK: adenylate kinase, ApoA1: apolipoprotein
A1, LDHB: lactate dehydrogenase B, Beta Trop: beta tropomyosin, Beta Eno: beta
enolase, GP: glycogen phosphorylase, ALDO B: aldolase B, TPI: triose phosphate
isomerase, GAPDH: glyceraldehyde 3-phosphate dehydrogenase, Actin, API: actin
polymerization inhibitor, PK: pyruvate kinase and CK: creatine kinase.



1. Design of the gene encoding the Q-protein concatamer

One of the inventors' major interests is in proteome dynamics (Pratt, Mol Cell
Proteomics 1 (2002), 579-591), and in changes in protein expression during muscle
development (Doherty, Proteomics 4 (2004), 2082-2093; Doherty, Proteomics in
press (2005)). A system that shows dramatic developmental changes in protein
expression is the chicken pectoralis skeletal muscle from immediately post-hatching
to maturity. Accordingly, for the demonstration QCAT set, the inventors
chose twenty chicken proteins that had been previously identified as changing in
expression level in developing skeletal muscle (Doherty, Proteomics 4 (2004),
2082-2093). A single tryptic fragment was chosen to represent each protein (a
"Q-peptide"), although a peptide that can be reproducibly generated by any proteolytic
or chemical fragmentation could be used. In this example, Q-peptide
selection was based on theoretical and experimental criteria. The first criterion
was that the Q-peptides should lack a cysteine residue, as a cysteine residue could
be used for quantification of the QCAT and the absence of cysteines should avoid
complex intra- and inter-molecular disulphide bond formation in the expressed
protein. Secondly, the peptide chosen should be unique within the set of Q-peptides.
Thirdly, the Q-peptides were chosen with masses between 1000 Da and
2000 Da, corresponding to the region in MALDI-ToF mass spectra where sensitivity
of detection is typically high and interfering signals are low. Fourthly, an operational
criterion was added, inasmuch as the inventors selected peptides that were
already demonstrated to give a strong signal on MALDI-ToF mass spectrometry.
Of these, 75% (15 out of 20) were Arg-terminated tryptic peptides; the propensity
of such peptides to give stronger signals on MALDI-ToF mass spectrometry is well
documented (Brancia, Electrophoresis 22 (2001), 552-559). A final, less important
criterion was that the Q-peptides should contain at least one instance of an abundant
and chemically refractory amino acid such as leucine or valine, as this would
facilitate metabolic labeling with amino acids for the preparation of stable isotope
labeled Q-peptides. The peptides are summarized in Table 1:



Table 1 - Peptides selected for Q-cat protein

Peptide  Mass (Da)  Sequence               N atoms  Parent protein
T1       405.2      MAGK                   5        Construct, sacrificial
T2       386.25     VIR                    6        Construct, sacrificial
T3       1036.52    GFLIDGYPR              12       Adenylate kinase
T4       1601.87    WLAYEPVWAIGTGK         16       Triose phosphate isomerase
T5       1176.57    NLAPYSDELR             14       Apolipoprotein A1
T6       1193.56    GDQLFTATEGR            15       Myosin binding protein C
T7       1789.88    SYELPDGQVITIGNER       21       Alpha actin
T8       1291.67    QWESAYEVIR             15       Lactate dehydrogenase B
T9       1390.74    LITGEQLGEIYR           16       Beta enolase
T10      1361.63    ATDAESEVASLNR          17       Alpha tropomyosin
T11      1160.58    SLEDQLSEIK             12       Myosin heavy chain (embryonic)
T12      1441.68    VLYPNDNFFEGK           15       Glycogen phosphorylase
T13      1489.71    GILAADESVGTMGNR        19       Aldolase B
T14      1345.64    ATDAEAEVASLNR          17       Beta tropomyosin
T15      1687.8     LQNEVEDLMVDVER         19       Myosin heavy chain (adult)
T16      1748.77    LVSWYDNEFGYSNR         19       Glyceraldehyde 3-phosphate dehydrogenase
T17      1767.98    ALESPERPFLAILGGAK      21       Phosphoglycerate kinase
T18      1249.65    QWDSAYEVIK             13       Lactate dehydrogenase A
T19      1803.93    AAVPSGASTGIYEALELR     21       Alpha enolase
T20      1823.97    LLPSESALLPAPGSPYGR     21       Actin polymerization inhibitor
T21      1857.9     FGVEQNVDMVFASFIR       25       Pyruvate kinase
T22      1991.95    GTGGVDTAAVGAVFDISNADR  25       Creatine kinase
T23      274.15     AGK                    4        Construct, sacrificial
T24      892.42     VICSAEGSK              10       Construct, quantification
T25      1408.68    LAAALEHHHHHH           24       Construct, purification tag
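
The theoretical selection criteria described before Table 1 can be sketched as a simple filter. This is a minimal illustration only: the function names are hypothetical, and the experimental criterion (a previously observed strong MALDI-ToF signal) is not modelled, apart from placing Arg-terminated peptides first.

```python
def passes_criteria(peptide, mass_da, chosen):
    """Check one candidate against the theoretical criteria in the text."""
    return (
        "C" not in peptide                     # no cysteine residue
        and peptide not in chosen              # unique within the Q-peptide set
        and 1000.0 <= mass_da <= 2000.0        # favourable MALDI-ToF mass window
        and any(aa in peptide for aa in "LV")  # Leu or Val for metabolic labelling
    )


def select_q_peptides(candidates):
    """Return passing (sequence, mass) candidates, Arg-terminated ones first."""
    chosen = []
    for peptide, mass_da in candidates:
        if passes_criteria(peptide, mass_da, {p for p, _ in chosen}):
            chosen.append((peptide, mass_da))
    # Arg-terminated peptides tend to give stronger MALDI-ToF signals.
    return sorted(chosen, key=lambda item: not item[0].endswith("R"))


candidates = [
    ("QWDSAYEVIK", 1249.65),  # Table 1, T18 (Lys-terminated)
    ("GFLIDGYPR", 1036.52),   # Table 1, T3 (Arg-terminated)
    ("VICSAEGSK", 892.42),    # Table 1, T24: contains Cys, below the mass window
    ("GFLIDGYPR", 1036.52),   # duplicate, rejected as non-unique
]
print(select_q_peptides(candidates))
```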



Once the candidate set was nominated, the Q-peptides were assembled and a
gene was constructed, which encoded the assembled Q-peptides using codons
for maximal expression in E. coli. At the C-terminus an extension was added to
provide a cysteine residue and a His tag purification motif (the latter provided in
this case by the vector pET21a). An additional series of amino acids was appended
to the N-terminus to provide an initiator methionine residue and a sacrificial
peptide, which when cleaved would expose a true Q-peptide (Figure 1). This
avoided complications due to N-formylation or removal of methionine from the
N-terminus of QCAT. The transcript encoded by the initial QCAT gene was then
analyzed in silico for features such as hairpin loops that might compromise translation.
If such a feature was noted, the order of the Q-peptides was swapped until an
acceptable mRNA structure was obtained; the sequence of Q-peptides within a
QCAT is not relevant to their use as quantification standards and the order is thus
amenable to such manipulation. The gene was constructed from a series of
overlapping oligonucleotides and confirmed by DNA sequencing.
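
The statement that the order of the Q-peptides does not affect the peptides released can be checked with a simple in silico digestion. The sketch below assumes the commonly used rule that trypsin cleaves C-terminal to lysine or arginine except when the next residue is proline; the function name is hypothetical, and only four peptides from Table 1 are concatenated for brevity.

```python
import re


def tryptic_digest(sequence):
    """Cleave C-terminal to Lys/Arg, except when the next residue is Pro."""
    # Zero-width split points: preceded by K or R, not followed by P.
    # Splitting on empty matches requires Python 3.7 or later.
    return [frag for frag in re.split(r"(?<=[KR])(?!P)", sequence) if frag]


# T1, T2 and T3 of Table 1, followed by T17, which contains the Arg-Pro
# site that is not expected to be cleaved.
q_peptides = ["MAGK", "VIR", "GFLIDGYPR", "ALESPERPFLAILGGAK"]

forward = tryptic_digest("".join(q_peptides))
shuffled = tryptic_digest("".join(reversed(q_peptides)))

assert forward == q_peptides                  # digestion recovers each peptide
assert sorted(shuffled) == sorted(q_peptides)  # order does not change the set
print(forward)
```

A reordering would only change the outcome if it placed a peptide beginning with proline directly after a lysine or arginine, which does not occur in this example.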

2. Expression of the QCAT 

The QCAT gene was constructed with restriction sites, such that it could be inserted
into a range of expression vectors (Figure 2). In this instance, the gene
(confirmed by DNA sequencing) was inserted into pET21a at the NdeI and HindIII
sites and was expressed initially in E. coli (NovaBlue (DE3)) grown in rich medium.
After induction by IPTG, SDS-PAGE analysis confirmed high-level expression of a
protein of the expected mass (~35 kDa). This protein was present in the insoluble
fraction of sonicated cells, and was presumed to be the QCAT protein present in
inclusion bodies. From this preparation the inventors purified the QCAT protein by
affinity chromatography using Ni-NTA resin, which resulted in a homogeneous
preparation (Figure 2, inset). The intact average mass of this protein, measured by
ESI-MS, was 33036±2 Da (data not shown), compared to the predicted mass of 33167
Da for the QCAT protein, a difference of 131 Da which is exactly consistent with
loss of the methionine residue from the N-terminus. The approx. 35 kDa gel band
was subjected to in-gel digestion with trypsin and analysed by MALDI-ToF MS. All
predicted QCAT peptides were readily observed in the MALDI-ToF mass spectrum,
although the N- and C-terminal sacrificial peptidic material yielded, by design,
fragments that were too small to be seen in the MALDI-ToF mass spectrum
(Figure 3). Although the peptides were chosen to yield good signals on MALDI-ToF,
some peptides were markedly less intense than others. These included all of the
lysine-terminated peptides, with the exception of T17, which contains a
non-cleavable Arg-Pro site and, although lysine-terminated, still yielded a strong
signal on MALDI-ToF mass spectrometry. The effect was particularly evident in the
isoform-specific Q-peptides T8 and T18, derived from lactate dehydrogenases B and
A (QWESAYEVIR and QWDSAYEVIK respectively): the lysine-terminated peptide
was less than 10% of the intensity of the arginine-terminated peptide. This
suggests that either Q-peptides should be predominantly drawn from
arginine-terminated peptides, or alternatively, that a step such as guanidination should be
used to convert lysine residues to homoarginine residues, enhancing the propensity
to give strong signals (Brancia, Electrophoresis 22 (2001), 552-559). The
QCAT was digested by trypsin very effectively and there was no evidence for partial
proteolytic products of the Q-peptides, which would of course compromise the
quantification step. The inventors then expressed the protein in minimal medium
containing 15NH4Cl as sole nitrogen source. When digested with trypsin, the resultant
MALDI-ToF mass spectrum was of high quality, and all Q-peptides were detectable
at the appropriate mass shift corresponding to the number of nitrogen
atoms in the peptide (data not shown).
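
The expected mass shift on uniform 15N labelling follows from the number of nitrogen atoms in the peptide. The sketch below assumes complete incorporation of 15N and peptides built from the twenty standard amino acids; the constant and function names are hypothetical.

```python
# Nitrogen atoms per amino acid residue; residues not listed contain one.
NITROGEN_PER_RESIDUE = {"R": 4, "H": 3, "K": 2, "N": 2, "Q": 2, "W": 2}

# Mass difference between 15N and 14N in Da.
DELTA_15N = 15.0001089 - 14.0030740


def nitrogen_count(peptide):
    """Number of nitrogen atoms in an unmodified peptide."""
    return sum(NITROGEN_PER_RESIDUE.get(aa, 1) for aa in peptide)


def n15_mass_shift(peptide):
    """Mass increase (Da) when every nitrogen atom is replaced by 15N."""
    return nitrogen_count(peptide) * DELTA_15N


peptide = "GFLIDGYPR"  # adenylate kinase Q-peptide, T3 in Table 1
print(nitrogen_count(peptide))            # 12, as listed in Table 1
print(round(n15_mass_shift(peptide), 2))  # 11.96
```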

The unlabelled and 15N-labelled QCAT proteins were then mixed in different ratios,
and digested with trypsin before the resultant limit peptides were analysed by
MALDI-ToF mass spectrometry. The heavy and light variants of the peptides were
readily discerned (Figure 3a) and their intensities measured for a series of peptides
(Figure 3b). In all instances, the data were of high quality, and the relationship
between proportion of material and the heavy:light ratios was linear, with a
slope of one (mean ± SD (n=6) = 1.008 ± 0.008), and very high correlation
coefficients (r² greater than 0.99 in all instances). The combined plot (Figure 3c)
collates the data for seven peptides; the close boundaries defined by the 95%
confidence limits indicate the quality of the quantification.
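
The linearity check described above amounts to fitting a straight line to measured versus expected proportions and inspecting the slope and r². The sketch below uses an ordinary least-squares fit; the data points are hypothetical and are not taken from the experiments reported here.

```python
def linear_fit(x, y):
    """Ordinary least-squares fit y = slope * x + intercept, with r squared."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, intercept, r_squared


# Hypothetical data: proportion of heavy QCAT in the mixture (expected) and
# the proportion derived from heavy and light peptide intensities (measured).
expected = [0.10, 0.25, 0.50, 0.75, 0.90]
measured = [0.11, 0.24, 0.51, 0.74, 0.91]

slope, intercept, r2 = linear_fit(expected, measured)
print(f"slope={slope:.3f} intercept={intercept:.3f} r2={r2:.4f}")
```

A slope close to one with an intercept close to zero indicates that the intensity ratio reports the molar ratio directly.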

Summary of results 

The inventors have applied this particular QCAT in the analysis of protein expression
in chick skeletal muscle, at 1 d and 27 d post-hatching (Figure 4). Twelve proteins
present in this preparation were also represented in the QCAT. MALDI-ToF
data of the tryptic peptides were readily acquired, and the changes in protein levels
that occur over the first three to four weeks post-hatching were determined.

Because the proteins were absolutely quantified, the inventors were able to express
the proteins as nmol per g wet weight of tissue. The variance of the triplicate
analyses was small; the inventors attribute this variance to biological rather than
analytical variation. The inventors have previously measured the levels of seven of
these proteins by 2D gel electrophoresis and densitometry, and the correlation
between the quantification using both methods was 0.82 (r², p>0.001). Recognizing
that the two methods assess different representations of the proteome, such
as charge-variant isoforms or total protein complement, and that the densitometric
method is inevitably imprecise, the correlation is good.
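
The conversion to nmol per g wet weight and the summary of triplicate animals can be sketched as follows. The sketch assumes that the unlabelled (light) peptide derives from the tissue and the labelled (heavy) peptide from a known amount of [15N]QCAT, with equal signal per mole. All function names and numerical values are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def nmol_per_g(light_intensity, heavy_intensity, qcat_nmol, tissue_g):
    """Endogenous protein (nmol per g wet weight) from one digest."""
    return (light_intensity / heavy_intensity) * qcat_nmol / tissue_g


def mean_and_sem(values):
    """Mean and standard error of the mean for replicate animals."""
    return mean(values), stdev(values) / sqrt(len(values))


# Hypothetical triplicate: (light, heavy) intensities for one peptide in
# three animals, each digest containing 2 nmol QCAT and 0.1 g of tissue.
replicates = [(8000.0, 4000.0), (9000.0, 4000.0), (10000.0, 4000.0)]
values = [nmol_per_g(light, heavy, qcat_nmol=2.0, tissue_g=0.1)
          for light, heavy in replicates]

average, sem = mean_and_sem(values)
print(f"{average:.1f} +/- {sem:.1f} nmol/g")
```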


The inventors have demonstrated the feasibility of the QCAT approach for generation
of a concatenated set of Q-peptides. The QCAT, designed using both theoretical
and experimental considerations, was expressed at high levels, even when
grown on minimal medium, and the product was successfully purified. Because
the QCAT is a completely artificial construct, the inventors did not anticipate that it
would fold into any recognizable three-dimensional structure, and as expected, the
protein aggregated into inclusion bodies. This is an advantage as subsequent
purification is simpler, only requiring resolubilization of the pellet in strong chaotropes
prior to affinity purification. Further, the lack of higher order structure of the QCAT
would ensure that the QCAT was digested at least as quickly as the target proteins
to be quantified.

Use of the artificial protein and the QCAT peptides 

There are two ways in which the concatamers will be used. First, in the direct
Q-peptide method, the concatamers are used as an internal standard. The
stable-isotope labelled concatamer can be directly added to a sample or cell preparation
before the proteolysis step. Alternatively, for some cell systems, the concatamer
quantification can be used to achieve absolute quantification of a reference strain
grown under carefully defined conditions (the indirect Q-peptide approach). This
reference strain can then be used, in stable isotope labelled form, as an absolute
quantification standard for all future proteomics quantification studies using that
organism. Once a reference strain is accurately quantified, any peptide can be
used to report on a protein, rather than the restricted set used as Q-peptides, and
this is clearly a very attractive proposition. This extends the generality of different
proteomic strategies, and creates a new niche for tagging methods such as ICAT
(Gygi, J Proteome Res 1 (2002), 47-54) and iTRAQ (Ross, Molecular and Cellular
Proteomics in press (2004)) in a comparative proteomics analysis of an unknown
against a fully quantified strain.

However, the inventors recognize that there are many instances where this approach
is not appropriate, and where a stable isotope labelled concatamer itself
(the direct Q-peptide approach) will be the appropriate standard. This is particularly
apposite in proteomics studies using biological material that cannot be readily
pre-labelled, for example, in animal tissues or in biomarker studies.

Particular advantages 

The strategy the inventors advocate is superior to chemical synthesis of each
individual Q-peptide, usually in a stable isotope labelled form. First, whilst
chemical synthesis has been used in one-off applications, the process of peptide
synthesis is not sufficiently 'clean' to obviate exhaustive purification of the
product. Secondly, for multiplexed assays, each peptide would need to be individually
quantified before use. Finally, chemically synthesized Q-peptides are a finite
resource whereas repeated expression of the QCAT gene is facile. High quality
absolute quantification may be an effective route to overcome the difficulties
associated with current methods for comparative proteomics, whether based on gel
analysis or mass spectrometry. A series of comparative studies of particular cellular
systems, each by comparison to a QCAT-quantified reference, would not only be
individually quantified, but should be sufficiently rigorous that, as the data sets
grow, any pairwise comparison would be robust, transferable between individual
laboratories and stable over time.

It should be possible to factor in the propensity of the peptide to ionize and
generate a good signal in the mass spectrometer. At present, ion intensities are not
used exhaustively in the analysis of mass spectra, although there have been
some recent attempts to predict intensity using knowledge-based approaches
(Krause, Anal Chem 71 (1999), 4160-4165; Gay, Proteomics 2 (2002), 1374-1391;
Baumgart, Rapid Commun Mass Spectrom 18 (2004), 863-868).

An additional factor that must be taken into account is the choice of precursor
label. Whilst uniform labeling with [13C] or [15N] ensures that every peptide is
comprehensively labeled, it might be preferable to select Q-peptides that each contain
the same amino acid, which is then used as the stable isotope labeled precursor.
Since most QCAT proteins would be anticipated to be assemblies of tryptic peptides,
a strategy of incorporation of [13C6]-lysine and [13C6]-arginine would also ensure
that most Q-peptides would be singly labeled and the mass offset between heavy and
light peptides would be a constant 6 Da. Further, an unlabelled QCAT could be labeled
in vitro using reagents advocated for comparative proteomics, extending all of these
technologies to absolute quantification.
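
The difference between the two labeling strategies can be illustrated with a short calculation. The following is a minimal sketch, not part of the described method: the function names and the example peptide are hypothetical, and the isotope mass differences are standard monoisotopic values. It shows that uniform [15N] labeling gives a mass offset that depends on the nitrogen content of each peptide, whereas [13C6]-lysine/arginine labeling gives approximately 6 Da for any tryptic peptide containing a single lysine or arginine.

```python
# Monoisotopic mass differences between heavy and light isotopes (Da).
DELTA_15N = 15.0001088989 - 14.0030740044  # 15N minus 14N
DELTA_13C = 13.0033548351 - 12.0           # 13C minus 12C

# Nitrogen atoms per amino acid residue (backbone amide N plus side chain N).
N_ATOMS = {
    "A": 1, "R": 4, "N": 2, "D": 1, "C": 1, "E": 1, "Q": 2, "G": 1,
    "H": 3, "I": 1, "L": 1, "K": 2, "M": 1, "F": 1, "P": 1, "S": 1,
    "T": 1, "W": 2, "Y": 1, "V": 1,
}


def shift_uniform_15n(peptide: str) -> float:
    """Mass offset (Da) when every nitrogen atom in the peptide is 15N."""
    return sum(N_ATOMS[aa] for aa in peptide) * DELTA_15N


def shift_13c6_lys_arg(peptide: str) -> float:
    """Mass offset (Da) when each lysine and arginine carries six 13C atoms."""
    labelled_residues = sum(peptide.count(aa) for aa in "KR")
    return labelled_residues * 6 * DELTA_13C


if __name__ == "__main__":
    example = "GFLIDGYPR"  # hypothetical tryptic peptide, for illustration only
    print(round(shift_uniform_15n(example), 4))   # depends on nitrogen count
    print(round(shift_13c6_lys_arg(example), 4))  # about 6 Da per K or R
```

For the hypothetical peptide above, which contains 12 nitrogen atoms and one arginine, the uniform [15N] offset is about 11.96 Da and the [13C6] offset is about 6.02 Da.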

Without being bound by any particular theory, the inventors believe that the number
of Q-peptides that could be assembled into a single QCAT is limited by the ability
to achieve high-level heterologous expression of large proteins. In the example
given here, the inventors chose 20 peptides of average length 15 amino acids and
average molecular weight 1.5 kDa. If 100 proteins were represented in a single QCAT,
the resultant recombinant protein would be 150 kDa, which should be readily
expressed. The entire yeast proteome, of approx. 6000 proteins, could then be
defined within approx. 60 QCAT constructs.
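
The arithmetic in the preceding paragraph can be reproduced as follows. This is a minimal sketch with hypothetical function names; it assumes one Q-peptide per target protein and ignores the mass of the terminal extensions and the purification tag.

```python
import math


def qcat_mass_kda(n_peptides: int, mean_peptide_kda: float = 1.5) -> float:
    """Approximate mass of a QCAT built from n_peptides Q-peptides."""
    return n_peptides * mean_peptide_kda


def constructs_needed(n_proteins: int, peptides_per_qcat: int = 100) -> int:
    """Number of QCATs needed, assuming one Q-peptide per protein."""
    return math.ceil(n_proteins / peptides_per_qcat)


if __name__ == "__main__":
    print(qcat_mass_kda(20))        # the 20-peptide example: 30.0 kDa
    print(qcat_mass_kda(100))       # 100 peptides: 150.0 kDa
    print(constructs_needed(6000))  # yeast proteome: 60 constructs
```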

This ability to quantify as many as 100 proteins in a single construct invites the
challenge of optimal assembly of individual Q-peptides in QCATs. Different criteria
might be envisaged. First, a group of Q-peptides would allow absolute quantification
of a particular subcellular fraction, or of a specific subset of proteins, for
example transcription factors or protein kinases. Secondly, and perhaps more
importantly, concatenation could be driven by the abundance of the target proteins in
the cell. It would be difficult to quantify two different proteins with widely
different expression levels in the same QCAT experiment, and it might be preferable to
assemble high abundance proteins in a construct distinct from that encoding low
abundance proteins.

Other applications for QCATs are readily envisaged, particularly in the broad areas
of clinical biology, toxicology and diagnostics. Absolute quantification will add a
new dimension to the predictive values of analyses in clinical or other biomarker
monitoring systems and will absolutely define the stoichiometric ratios of
individual proteins within a subcellular compartment or a multi-protein complex.

Materials and methods 

Materials. [15N]H4Cl (99% atom percent excess) was provided by CK Gas Products Ltd.,
Hampshire, UK. Most reagents, except where listed here, have been described
previously (Doherty, Proteomics 4 (2004) 2082-2093; Pratt, Proteomics 2 (2002),
157-160). Chick (layer, Hi-Sex Brown) skeletal muscle proteins were prepared from
1 d and 27 d old chicks as a 20,000 g supernatant of a 10 % (w/v) homogenate
(Doherty, Proteomics 4 (2004) 2082-2093; Doherty, Proteomics 5 (2005), 522-533).

QCAT gene design and construction. Q-peptides were selected for uniqueness of mass,
propensity to ionise and be detectable in mass spectrometry, the presence of specific
amino acid residues (for example, leucine or valine), and the absence of other amino
acid residues (cysteine, histidine, methionine). The peptide sequences were then
randomly concatenated in silico and used to direct the design of a gene,
codon-optimised for expression in E. coli. The predicted transcript was analysed for
RNA secondary structure that might diminish expression, and if this was present, the
order of the peptides was altered. N- and C-terminal sequences were added as
sacrificial structures, protecting the assembly of true Q-peptides from
exoproteolytic attack during expression. Additional peptide sequences were added to
provide an initiator methionine and a C-terminal cysteine residue for quantification.
The artificial gene was synthesised de novo (by Entelechon GmbH, Germany) from a
series of overlapping oligonucleotides, verified by DNA sequencing and ligated into
the NdeI and HindIII sites of the pET21a expression vector, to yield the QCAT
plasmid, pET21a QCAT. A His6 purification tag was provided by fusion to the vector.
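
The in silico selection and concatenation steps described above might be sketched as follows. This is an illustrative outline only, not the procedure actually used. The function names and example peptides are hypothetical. The requirement that each peptide ends in lysine or arginine with no internal cleavage site is an assumption based on the use of trypsin. Checks for uniqueness of mass, ionisation propensity and RNA secondary structure are omitted.

```python
import random

# Residues whose absence was a selection criterion in the text.
EXCLUDED = set("CHM")


def is_candidate(peptide: str) -> bool:
    """Return True for a tryptic-style peptide free of excluded residues.

    Assumption: a usable Q-peptide ends in K or R and has no internal K or R.
    """
    ends_tryptic = peptide[-1] in "KR"
    no_internal_site = not any(aa in "KR" for aa in peptide[:-1])
    no_excluded = not (set(peptide) & EXCLUDED)
    return ends_tryptic and no_internal_site and no_excluded


def concatenate(peptides, seed=0):
    """Shuffle the peptides reproducibly and join them into one sequence."""
    rng = random.Random(seed)
    order = list(peptides)
    rng.shuffle(order)
    return order, "".join(order)


if __name__ == "__main__":
    # Hypothetical candidate peptides, for illustration only.
    pool = ["GFLIDGYPR", "ACDEFK", "LVNELTEFAK", "AKLLR"]
    selected = [p for p in pool if is_candidate(p)]
    order, sequence = concatenate(selected, seed=1)
    print(order)
    print(sequence)
```

If the predicted transcript of one ordering were unsuitable, a different `seed` would give an alternative order of the same peptides.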

QCAT gene expression and labeling with 15N. The QCAT plasmid, pET21a QCAT, was used
to transform NovaBlue (DE3) (K-12 endA1, hsdR17(rK12- mK12+), supE44, thi-1, recA1,
gyrA96, relA1, lac, F'[proA+B+, lacIqZΔM15::Tn10(TcR)]) cells to ampicillin
resistance. Cells were grown at 37 °C in Luria broth containing 100 µg/ml ampicillin
to an A600 of 0.4-0.6, and IPTG was added to 1 mM. Incubation continued for a further
five hours, after which cells were pelleted by centrifugation (5,000 g, 4 min, 4 °C)
and resuspended in 10 ml of 20 mM Tris/HCl buffer, pH 8.0, and lysozyme was added
(100 µg/ml) for ten minutes at room temperature. Cells were then sonicated (three
bursts of 30 s) on ice and centrifuged at 14,000 g for 10 min. Pellets and
supernatants of induced and uninduced cultures were analysed by 12.5 % (w/v)
SDS-PAGE/Coomassie blue staining. For 15N-labelling, cells were grown in M9 minimal
medium prepared using [15N]H4Cl (20 mM), and induced and processed as above.

Purification and analysis of the QCAT protein. The pellets from sonicated cells
were dissolved in 20 mM phosphate buffer (pH 7.5) containing 20 mM imidazole and 8 M
urea (Buffer A) before being applied to a NiNTA column (GE Healthcare). After 10
column volumes of washing in the same buffer, the bound material was eluted with
Buffer A containing an increased concentration of imidazole (500 mM). This material
was desalted on Sephadex G25 'spun columns' and the mass of the eluted protein was
determined by electrospray ionisation mass spectrometry using a Waters-Micromass
Q-ToF micro mass spectrometer. The mass spectra were processed using the MaxEnt I
algorithm. The purified desalted protein was digested with trypsin, and the
resultant peptides were mass measured using a Waters-Micromass MALDI-ToF mass
spectrometer (Doherty, Proteomics 4 (2004) 2082-2093). To assess the response ratio
of heavy and light variants of the QCAT, the purified 'heavy' and 'light' proteins
were mixed in different ratios prior to digestion with trypsin and MALDI-ToF mass
spectrometry. The intensities of the [14N]- and [15N]-peptides were measured on
centroided spectra.
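
One way to evaluate such a mixing experiment is to regress the observed heavy:light intensity ratio on the ratio in which the two proteins were mixed. The following is a minimal sketch under that assumption. The function name is hypothetical and the numbers are made-up placeholders, not measured data. A slope close to 1 and an intercept close to 0 would indicate that heavy and light peptides respond equally.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


if __name__ == "__main__":
    # Placeholder values for illustration only (not experimental results).
    mixed_ratio = [0.25, 0.5, 1.0, 2.0, 4.0]      # heavy:light as mixed
    observed_ratio = [0.25, 0.5, 1.0, 2.0, 4.0]   # heavy:light peak intensity
    slope, intercept = fit_line(mixed_ratio, observed_ratio)
    print(slope, intercept)
```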

Use of the QCAT to quantify muscle protein expression. The supernatant fraction
containing chicken soluble proteins derived from 100 mg of tissue was mixed with
290 µg of [15N] QCAT, quantified by protein assay, and digested with trypsin
overnight. The QCAT was digested at a higher rate than endogenous muscle proteins
(results not shown). The experiment was replicated for three animals at each time
point. Subsequently, the [14N]- (muscle) and [15N]- (QCAT) peptides were identified
by mass, and their relative intensities measured by MALDI-ToF mass spectrometry.
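
The conversion from relative intensities to absolute amounts can be sketched as follows. This is an illustrative calculation only. The function names, the QCAT molecular weight and the intensities are hypothetical, and it assumes that heavy and light forms of a peptide give equal signal per mole and that both the QCAT and the target protein are completely digested.

```python
def qcat_pmol(mass_ug: float, mw_kda: float) -> float:
    """Convert a mass of QCAT (micrograms) to picomoles.

    Micrograms divided by kDa gives nanomoles; multiply by 1000 for pmol.
    """
    return mass_ug / mw_kda * 1000.0


def analyte_pmol(light_intensity: float, heavy_intensity: float,
                 standard_pmol: float, copies_in_qcat: int = 1) -> float:
    """Absolute amount of a target peptide from its light:heavy ratio.

    copies_in_qcat is the number of times the Q-peptide occurs in the QCAT.
    """
    ratio = light_intensity / heavy_intensity
    return ratio * standard_pmol * copies_in_qcat


if __name__ == "__main__":
    # Hypothetical values: 30 micrograms of a 30 kDa QCAT, ratio 0.5.
    standard = qcat_pmol(mass_ug=30.0, mw_kda=30.0)
    amount = analyte_pmol(light_intensity=50.0, heavy_intensity=100.0,
                          standard_pmol=standard)
    print(standard, amount)
    print(amount / 100.0)  # pmol per mg for a 100 mg tissue sample
```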

Mass Spectrometric Analysis 

The analysis in this example was carried out using a MALDI-ToF mass spectrometer,
but the method is equally applicable to all other mass spectrometric methods
suitable for the analysis of peptides.

Claims

1. Artificial protein for quantitative analysis of the proteome of a sample, cell or
organism, comprising:

(a) at least two consecutive peptides linked by a cleavage sequence for
separating the peptides;

(b) a singular marker on one or more peptides for determination of the
absolute amount of the protein; and

(c) N-terminal and C-terminal extensions for protection of the peptides;

wherein each peptide represents one single protein of the sample, cell or
organism and each peptide is in a defined stoichiometry.

2. Artificial protein of claim 1, wherein the protein comprises 10 - 200 peptides,
preferably 10 - 100 peptides, more preferred 20 - 60 peptides, most preferred
60 peptides.

3. Artificial protein of claim 1 or 2, wherein the cleavage sequence is cleaved 
by a protease, preferably by trypsin. 

4. Artificial protein of any of the preceding claims, wherein the singular marker
is a cysteine residue.

5. Artificial protein of any of the preceding claims, wherein one or more peptides
are repeated identically one or more times, in order to achieve a particular
stoichiometry between all peptide species.

6. Artificial protein of any of the preceding claims, wherein the protein com- 
prises an affinity tag for purification of the protein. 



7. Artificial protein of any of the preceding claims, wherein the protein is
labelled by an isotope.

8. Artificial protein of any of the preceding claims, wherein each peptide
comprises between about 3 and 40 amino acids, preferably about 15 amino acids.

9. Artificial protein of any of the preceding claims, wherein the peptides repre- 
sent different conformational, metabolic or modification states of the pro- 
tein. 

10. Artificial protein of any of the preceding claims, wherein the peptides have a 
defined molecular weight distribution and quantitative ratios. 

11. A collection of peptides as defined in step (b) of claim 1, which covers the 
complete proteome of an organism to allow the rapid quantification of the 
proteome of such an organism. 

12. Vector comprising a nucleic acid encoding the artificial protein of any of 
claims 1-10. 

13. Kit comprising the vector of claim 12 and/or the artificial protein of any of 
claims 1-10. 

14. A method for quantitative analysis of the proteome of a sample, cell or or- 
ganism, comprising the steps of: 

(a) quantifying the amount of the protein or one peptide containing the 
singular marker in an absolute manner; 

(b) generating a preparation of the proteins to be quantified; 

(c) mixing the products of steps (a) and (b); 

(d) completely cleaving the artificial protein of any of claims 1-10 and
the proteins to be quantified in step (b) at the cleavage sequence;

(e) determining the quantitative amount of peptides; 

(f) calculating the absolute amount of peptides, 

wherein the artificial protein and/or the peptides are isotopically labelled. 